I. Introduction

The principle of global parallelism in parallel programming was introduced by
Jordan [1] through a set of FORTRAN macros called the Force macros. These macros
support the construction of programs to be executed in parallel by a Force of processes. The
number of processes is left unspecified at compile time, but is potentially quite large.
The Force provides a FORTRAN-style parallel programming language with an extensive
set of parallel constructs. The programmer, insulated from process management, is
left free to concentrate on the synchronization issues of parallel programming.

A Force module, i.e., a main program or subroutine, consists of regular FORTRAN
77 statements that will be executed by all processes from the first line of the program
listing, unless limited by a process synchronization construct. Macros in the Force support
parallel execution of DO loops using pre-scheduled and self-scheduled algorithms. The
Force includes constructs to allow for mutual exclusion, synchronization, and/or
sequential execution when necessary, and constructs for data based control of execution.

A key feature of the Force is its management of variables in an MIMD environment.
The Force maintains six classes of variables. Each class in turn supports all the standard
FORTRAN variable types: INTEGER, REAL, COMPLEX, etc. The parallelism class of
a Force variable determines how it is accessed by different processes and may be Private,
Shared, or Async. Each of these three classes also inherits from FORTRAN a storage
class, either COMMON among program modules or local to one module, yielding six
classes. Private variables have a separate instantiation for each component process of the
Force. Shared variables have only a single instantiation and are accessible by all
processes of the Force. Async, or "asynchronous," variables have a "full/empty" state
associated with them, and are shared between processes as well. Interprocess
communication is achieved through use of Shared or Async variables. The FORTRAN COMMON
mechanism is used to implement Force COMMON. The Force variable declarations are
meant to supersede FORTRAN variable declarations. However, ordinary FORTRAN
declarations will normally be treated as Private, so that sequential FORTRAN modules
may be called from Force modules.

This manual describes the Force constructs in detail. Force constructs are
divided into four categories: program structure, declaration of variables, parallel
execution, and synchronization. The programmer using the Force writes a program that is to
be executed simultaneously by an arbitrary number of processes. This number is a
run-time parameter. The program may consist of many Force modules. A Force module is
analogous to a FORTRAN main program or subroutine, except that a Force module is called
and executed by all of the processes. The Force constructs are summarized in TABLE-I.
Triangular brackets, < >, are used to indicate required parameters; square brackets, [ ],
are used to indicate optional parameters. An example of a complete Force program is
shown later in this manual.

TABLE-I  Force Program Constructs

Program  Structure: 

Force  <name>  of  <#  of  procs>  ident  <proc  id> 

<  declaration  of  variables  > 

[ Externf < Force module name > ]

End  declarations 

<Force  program> 

Join 

END 


Forcecall <name>([parameters])


Forcesub  <name>([parameters])  of  <#  of  procs>  ident  <proc  id> 

<  declarations  > 

[ Externf < Force module name > ]

End  declarations 

<  subroutine  body  > 

RETURN 

END 

Declaration  of  Variables: 

Private <FORTRAN type> <variable list>

Private Common /<label>/ <FORTRAN type> <variable list>

Shared <FORTRAN type> <variable list>

Shared Common /<label>/ <FORTRAN type> <variable list>

Async <FORTRAN type> <variable list>

Async Common /<label>/ <FORTRAN type> <variable list>




Parallel  Execution: 

Pcase on <variable>
<code block>

[Usect]

[Csect (<condition>)]

End pcase

Scase 

[Csect  (<condition>)] 

<code  block> 

[Usect] 

End  scase 

Presched Do <n> <var> = <i1>,<i2>[,<i3>]

<loop body>

<n> End Presched Do

Selfsched Do <n> <var> = <i1>,<i2>[,<i3>]

<loop body>

<n> End Selfsched Do

Pre2do <n> <var1>=<i1>,<i2>[,<i3>]; <var2>=<j1>,<j2>[,<j3>]
<doubly indexed loop body>

<n> End Presched Do

Self2do <n> <var1>=<i1>,<i2>[,<i3>]; <var2>=<j1>,<j2>[,<j3>]
<doubly indexed loop body>

<n> End Selfsched Do

Askfor  Do  <n>  Init :  <i> 

<loop  body> 

Critical  <var> 

More  work  <j> 

<put  work  in  data  structure> 

End  critical 
<loop  body> 

<n>  End  Askfor  Do 


Synchronization: 

Barrier 

<  code  block  > 

End  barrier 

Critical  <lock-var> 

<  code  block  > 

End  critical 

Void  <async  variable> 

Produce  <async  variable>  =  <expression> 
Consume  <async  variable>  into  <variable> 
Copy  <async  variable>  into  <variable> 

...  Isfull(  <async  variable>  ) ... 


II. Description of the Force Macros

The  macros  are  divided  into  four  groups:  program  structure,  variable  declaration, 
parallel  execution,  and  synchronization.  The  user  of  the  Force  macros  writes  a  single 
parallel  main  program,  zero  or  more  parallel  Force  subroutines,  and  zero  or  more  single 
stream  subroutines  to  be  executed  by  a  single  process.  When  writing  the  parallel  main 
program  and  parallel  subroutines,  the  macros  given  in  the  previous  table  and  described 
below  may  be  used.  The  single  stream  subroutines  and  all  of  the  code  except  the  macros 
in  the  parallel  routines  are  in  FORTRAN  77  and  familiarity  with  that  language  is 
assumed. 

The  number  of  processes  executing  a  Force  program  is  a  parameter  that  the  user 
will  supply  at  run  time.  What  actually  happens  is  that  execution  of  a  Force  program 
begins  with  a  "driver"  routine.  The  driver  will  determine  the  number  of  processes,  create 
these  processes,  all  of  which  will  then  transfer  control  to  the  user  main  program.  This 
procedure  is  invisible  to  the  user  and  programmer. 
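
The start-up sequence can be pictured with a small model. The sketch below is written in Python, not in the Force, and uses threads to stand in for processes; the names run_force, main, nproc and me are chosen for illustration and are not part of the Force macros. It only shows the idea that a driver creates a requested number of workers and that every worker enters the same main program with its own index.

```python
import threading

def run_force(main, nproc):
    """Hypothetical driver: create nproc workers and start each in the same main program."""
    workers = [
        threading.Thread(target=main, args=(nproc, me))
        for me in range(1, nproc + 1)      # worker indices run from 1 to nproc
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()                           # the driver regains control at the end

seen = []

def main(nproc, me):
    # Every worker executes this same body; only the index `me` differs.
    seen.append((me, nproc))

run_force(main, 4)
print(sorted(seen))
```

In this model the number of workers is an ordinary argument of run_force, which mirrors the statement that the process count is supplied at run time rather than fixed in the program text.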

Two terms are used when referring to the parallel execution macros. These terms
are "pre-scheduling" and "self-scheduling." Pre-scheduling refers to a division of labor
(usually based on the local process index) that is fixed at compile time and independent
of the actual work being done. Self-scheduling refers to a dynamic, run time allocation
of work to processes. Self-scheduling is more sophisticated, and regulates the work load
better, but it requires greater overhead.
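
The trade-off between the two strategies can be illustrated with a small simulation. The sketch below is a Python model, not Force code. It assumes that pre-scheduling deals iterations out cyclically by process index and that self-scheduling gives the next iteration to whichever process becomes idle first; scheduling overhead is ignored. The function names and the cost figures are hypothetical.

```python
import heapq

def presched_finish_times(costs, nproc):
    """Static model: iteration k is assigned to process k mod nproc in advance."""
    finish = [0.0] * nproc
    for k, cost in enumerate(costs):
        finish[k % nproc] += cost
    return finish

def selfsched_finish_times(costs, nproc):
    """Dynamic model: the process that becomes idle first takes the next iteration."""
    heap = [(0.0, p) for p in range(nproc)]   # (time when idle, process number)
    heapq.heapify(heap)
    for cost in costs:
        idle_at, p = heapq.heappop(heap)
        heapq.heappush(heap, (idle_at + cost, p))
    finish = [0.0] * nproc
    for idle_at, p in heap:
        finish[p] = idle_at
    return finish

# Two expensive iterations among cheap ones, shared by two processes.
costs = [8, 1, 1, 1, 8, 1, 1, 1]
print(max(presched_finish_times(costs, 2)))   # completion time with static assignment
print(max(selfsched_finish_times(costs, 2)))  # completion time with dynamic assignment
```

With these figures the static assignment places both expensive iterations on the same process, while the dynamic assignment splits the total work evenly. When all iterations cost about the same, the two models finish together and only the extra overhead of self-scheduling, which this sketch leaves out, separates them.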

We have adopted the following convention: the first Force keyword to appear on a
line must have its first letter capitalized and the remaining letters in lower case.
Additional keywords on the same line are case insensitive. For example, Barrier would be
recognized by the Force preprocessor, but barrier or BARRIER would not. A pattern
matching preprocessor is used, and this convention makes confusion between Force
keywords and FORTRAN variable names less likely.

Syntactically, the Force macros adhere to FORTRAN standards and may be
continued on two or more lines. A few differences between the Force macros and standard
FORTRAN syntax exist; these will be given later in the restrictions section.


II  A.  Macros  Specifying  Program  Structure 
Force 

The  Force  macro  declares  the  start  of  a  parallel  main  program  and  has  the  following 
syntax: 

Force  <name>  of  <nproc>  ident  <me> 

The Force statement sets up the parallel environment. All processes begin
execution from this point on, until they are terminated by the Join statement. <nproc>
and <me> are both user named integer variables, with <nproc> containing the
number of processes in the Force, and <me> containing a unique identifier for each
process (between 1 and <nproc>). <nproc> and <me> will be declared
automatically. Values are assigned automatically to <nproc> and <me>, and these values
must not be changed by the user program.

The Force main program ends with a Join statement, usually followed by the
FORTRAN END statement. The Join statement terminates all but one of the Force of
processes. This last process will return control to the Force driver program. An
example:

Force MYFORCE of COUNT ident MYINDEX
<declarations>
End declarations
C  Force body with
C  COUNT   - is a user named shared integer
C            variable which will receive the number
C            of processes executing the program.
C  MYINDEX - is a user named private integer
C            variable that will be set to a unique
C            index for each process, numbered
C            between 1 and COUNT.
Join
END


End  declarations 

This  macro  call  terminates  the  declarations  section  of  a  Force  module  and  begins  its 
executable  code.  It  marks  the  place  to  insert  declarations  generated  automatically  by 
the  macros  and  may  generate  some  executable  code.  End  declarations  must  follow 
the  last  declaration  statement  and  precede  the  first  executable  statement  of  a  Force 
module. 

Some examples using the End declarations macro are given in the descriptions of
the Force and Forcesub macros. Note that every Force or Forcesub statement
must have exactly one End declarations statement following it at some point in the
program listing for that module.


Join 

Join terminates execution of the parallel main program. It is an executable
statement, but is listed with the macros determining program structure because it is, in
some sense, the inverse of the Force statement. Instead of creating a Force of
processes, Join will terminate all processes except the last one to reach it. This last
process returns to the Force driver program, where it too will be terminated. Note that
the non-executable FORTRAN END statement is still necessary.

Forcesub 

The Forcesub statement declares the start of a parallel subroutine and has the
general form:

Forcesub <name>(<parameter list>) of <nproc> ident <me>

This statement is roughly analogous to the Force statement. Each process will
maintain its local copy of its process index, <me>, from the calling module;
however, this index may be renamed in the Forcesub header. Declarations including
Private, Private Common, Shared, Shared Common, Async, or Async Common
statements may come between a Forcesub statement and the following End
declarations. There is no special Force keyword to terminate a parallel subroutine. The
FORTRAN RETURN statement is used to return control to the calling module. The
arguments passed to a Forcesub via the parameter list should be declared using only
normal FORTRAN declarations. Such arguments retain the parallelism class,
Private or Shared, with which they were defined in the calling module. Current
implementations do not support Async variables passed as parameters. The
following is an example of a Forcesub:


C  MATRIX MULTIPLICATION SUBROUTINE: C=A*B

Forcesub MULT(A,B,C,N1,N2,M1) of NPROCS ident ME
INTEGER N1,N2,M1
REAL A(N1,N2), B(N2,M1), C(N1,M1)
Private INTEGER I,J,K
End declarations

C  Initialize C ...
Pre2do 100 I = 1,N1 ; J = 1,M1
C(I,J) = 0.0
100 End presched do

C  The multiplication process ...
Presched DO 300 I = 1,N1
DO 200 J = 1,M1
DO 200 K = 1,N2
200 C(I,J) = C(I,J) + A(I,K)*B(K,J)
300 End presched DO

RETURN
END

This parallel subroutine can be called with a Forcecall statement as follows:

Shared REAL A(100,50), B(50,100), C(100,100)
Private INTEGER N1, N2, M1
End declarations

Forcecall MULT(A,B,C,N1,N2,M1)


Externf

The  syntax  of  this  macro  is  as  follows: 

Externf  <Force  module  name  list> 

Externf is used to inform the Force compiler/preprocessor about external Forcesub
modules that are called using Forcecall. "External modules" refers only to Force
modules that are not included in the same file as the Force main program. Modules
defined below the main Force program (within the same file) are not required to be
declared Externf. This feature preserves the "separate compilation" feature of the
FORTRAN language. When a list of external module names is specified using
Externf, names in the list should be separated by commas. Some examples:

Externf INTMAT
Externf INTMAT, OUTMAT


The Externf statement is placed in the declarations section of a Force program.
Externf may appear in any Force module that has itself been declared Externf.
Consider the following example. Force modules A, B, and C each appear in separate
files, which are perhaps to be compiled separately. If the Force main program, A,
calls Forcesub B, which in turn calls Forcesub C, then A must declare B using
Externf, and B must declare C as Externf. The point is that as long as C is declared
Externf in B, which is declared in A, then A need not declare C as Externf. Multiple
declarations, while not required, are allowed.


Forcecall 

The Forcecall executable statement is used to invoke parallel subroutines that have
been declared as named Forcesub modules.

Forcecall <name>(<parameter list>)

The entire Force of processes will jump to and execute the parallel subroutine.
However, Forcecall does not cause synchronization. Forcecall differs from the
regular FORTRAN CALL only in that provisions are made to automatically pass the
local process identifier <me> for each process. Normal FORTRAN scope rules apply to
Force variables. Note that Async variables may not be included in the parameter list,
but may be passed through an Async Common block instead.


II B.  Variable Declarations

The implementation of the Force as a preprocessor, which does not construct a
symbol table, requires that all type information be included in the Private or Shared
declarations, so that it is available during the preprocessing of that statement. It should be noted
that FORTRAN IMPLICIT typing of variables is allowed under the Force, and that all
implicitly typed variables will be of Force variable class Private.


Private 

Private  <type>  <variable  list> 

When  a  variable  is  declared  Private,  then  each  process  of  the  Force  maintains  its 
own  storage  space  for  that  variable,  even  though  the  variable  is  named  only  once  in 
the  main  program  listing. 

For  example: 

Private DOUBLE PRECISION X(100,100)
Private INTEGER I,J,K
Private CHARACTER*80 STRING1

Such  variables  are  normally  used  for  arithmetic  temporaries  or  index  values  which 
have  distinct  values  for  each  process  of  the  Force. 


Private  Common 

A Private Common variable is Private in the sense defined above, but it may be
Common between Force modules. This declaration would appear, with the variables
specified in the same order, in each of the modules that wished to include the
Common variables. The syntax is as follows:

Private Common /<label>/ <type> <variable list>

Unlike FORTRAN 77, Force Common variables are typed within the same
statement that declares them to be Common. They must also be dimensioned in that
statement.

For  example: 

Private  Common  /  MYCOPY  /  REAL  TIME(15) 

Private  Common  /  MYCOPY  /  INTEGER  POS,  SPEED 
Private  Common  /GRID/  COMPLEX  X,Y 

From the example, we can see that variables of different type may be combined
within the same Common block, but this requires different declaration statements.
As in FORTRAN 77, it is the programmer's responsibility to ensure that all
Force modules that use a given COMMON block specify the variables of that
COMMON block in the proper order. Also note that arrays are dimensioned on
this line. FORTRAN "blank COMMON" is not allowed.


Shared 

When a variable is declared Shared, only one copy of that variable is
maintained for all of the processes in the Force. In this manner, multiple processes may
operate on and communicate through shared memory locations. Care must be taken
when multiple processes try to modify a Shared variable all at once. Normally, one
would modify a Shared variable only within a critical section of the program.
Regular FORTRAN declarations follow the Shared keyword. The syntax is as follows:

Shared <type> <variable list>

For example:

Shared  INTEGER  I,  J 
Shared  REAL  A(800),  B(800) 

This  example  declares  I  and  J  to  be  shared  integers  and  declares  A  and  B  to  be  real 
vectors  of  the  specified  dimension. 
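
The advice to update a Shared variable only inside a critical section can be illustrated outside the Force. The following Python sketch is a model only: threads stand in for Force processes, the module-level variable total plays the role of a Shared variable, and a threading.Lock plays the role of the lock variable of a critical section. All names are hypothetical.

```python
import threading

total = 0                  # plays the role of a Shared variable
lock = threading.Lock()    # plays the role of the lock variable of a critical section

def add_partial(values):
    global total
    partial = sum(values)  # private work on a local variable, no lock needed
    with lock:             # critical section: one worker at a time updates total
        total += partial

chunks = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
threads = [threading.Thread(target=add_partial, args=(c,)) for c in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(total)
```

Each worker accumulates into a private temporary first and enters the critical section only once, which keeps the serialized part of the computation short.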


Shared  Common 

This  statement  has  the  following  syntax: 

Shared  Common  /<label>/  <type>  <variable  list> 

A Shared Common variable is Shared between processes as defined above. In
addition, Shared Common variables may be common between Force modules. That is to
say, different processes in different Force modules (subroutines) all have access to
the same variable.

Again,  as  in  Private  Common,  the  type  of  a  variable  is  declared  on  the  same  line 
with  the  Common  declaration.  Variables  of  different  type  may  be  combined  within 
the  same  Shared  Common  block,  but  this  will  require  the  use  of  several  declaration 
statements.  Once  again,  as  in  FORTRAN  77,  it  is  the  programmer’s  responsibility 
to  preserve  the  ordering  of  variables  in  a  Common  block.  An  example: 

Shared  Common  /PENPOS/  DOUBLE  PRECISION  X,Y 
Shared  Common  /PENCOL/  INTEGER  COLOR(8) 


Async 

This  statement  has  the  general  form: 

Async <type> <variable list>

Asynchronous variables are shared between processes; that is, they have only one
instantiation for all processes. The distinguishing feature of an Async variable is its
"full/empty" state. The use of these variables is governed by the Produce,
Consume, Copy, Void, and Isfull macros, which are described later. Briefly, an
asynchronous variable may be consumed or copied only if it is "full," and produced only
if it is "empty." Thus, Async variables may be used to implement data based
synchronization.


For  example,  the  following  Force  program  fragment  illustrates  the  use  of 
this  macro: 

Async  INTEGER  I 
Async  REAL  X,  Y,  Z 

End  Declarations 


Barrier 
Void  X 
End  barrier 


Produce  X  =  local_stuff 
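
The full/empty discipline can be modelled in a few lines of Python. The sketch below is not the Force implementation. It assumes that a Produce waits until the variable is empty, that Consume and Copy wait until it is full, that Consume leaves the variable empty while Copy leaves it full, and that Void forces the empty state. The class name AsyncVar and its method names are hypothetical and simply echo the macro names.

```python
import threading

class AsyncVar:
    """Hypothetical model of a shared variable with a full/empty state."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def void(self):
        with self._cond:
            self._full = False            # force the empty state
            self._cond.notify_all()

    def produce(self, value):
        with self._cond:
            while self._full:             # wait until empty
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def consume(self):
        with self._cond:
            while not self._full:         # wait until full
                self._cond.wait()
            self._full = False            # consuming empties the variable
            self._cond.notify_all()
            return self._value

    def copy(self):
        with self._cond:
            while not self._full:         # wait until full
                self._cond.wait()
            return self._value            # copying leaves the variable full

    def isfull(self):
        with self._cond:
            return self._full

x = AsyncVar()
received = []
consumer = threading.Thread(target=lambda: received.append(x.consume()))
consumer.start()                          # may block until a value is produced
x.produce(3.5)
consumer.join()
print(received, x.isfull())
```

The consumer thread cannot finish before the producer has supplied a value, which is the data based synchronization referred to above.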


Async  Common 

This  statement  has  the  general  form: 

Async Common /<label>/ <type> <variable list>

Async Common variables have all the properties of Async variables described above.
In addition, they may be Common between Force modules that include this
designation.


II C.  Parallel  Execution 

Parallel execution can be specified by three kinds of macro constructs. Two kinds
are related to the DOALL and parallel case constructs. The two constructs are similar to
the extent that both involve segments of code that can be executed in any order. DOALL
applies to independent instances of a code body for different index values, as in loops.
The parallel case construct applies to different single stream code blocks which are
mutually independent. The distribution of work may either be pre-scheduled or self-scheduled.
The third macro is related to the Askfor monitor that was originally proposed by
Lusk and Overbeek [5]. This construct provides a means of scheduling the execution of a
body of sequential code which may require a dynamically increasing number of
executions, as in recursive algorithms. An initial number of executions of the Askfor loop body
is specified and this number may be increased within the body using the More work
macro.

Pcase

This statement establishes a pre-scheduled parallel case construct, which starts with
either of the following forms:

Pcase

or

Pcase on <var>

The construct consists of a series of independent sections of code, each of which is
to be executed by a single process. The sections are delimited by a Pcase statement,
zero or more Usect statements, zero or more Csect statements, and an End pcase statement.

The construct assigns its own private integer variable unless <var> is given explicitly
in the second form of the construct. In that case, the programmer must declare
<var> as a Private INTEGER variable. In either case, the execution of multiple cases is
pre-scheduled using this variable, which is assigned the value j during the execution
of the jth case. The jth case will be executed by the process with local id equal to
((j-1) mod P)+1, where P is the number of processes.
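
The assignment rule ((j-1) mod P)+1 is easy to tabulate. The short Python sketch below is an illustration rather than Force code; the function name pcase_owner is hypothetical. It lists which process would execute each of five case sections when the Force has three processes.

```python
def pcase_owner(j, nproc):
    """Return the 1-based process id that executes the j-th case section (j starts at 1)."""
    return ((j - 1) % nproc) + 1

for j in range(1, 6):
    print(j, pcase_owner(j, 3))
```

Sections 1 and 4 fall to the same process, as do sections 2 and 5, so with fewer processes than sections some sections are necessarily executed one after the other.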

If there are at least as many processes in the Force as there are code sections, then all
code sections may be executed simultaneously. Otherwise some will be executed
sequentially by the same process. Thus care must be taken when using asynchronous
variables (producer/consumer) within a Pcase, since a section that waits for a value
produced by a later section assigned to the same process would never complete. A
parallel case with only one code section is similar to a barrier in that the code is
executed by a single process, but differs in that no synchronization of other processes occurs.

There are slight variations in the implementation of the parallel case construct. An
example of the simplest implementation is given below. Here each task represents a
group of regular (single stream) FORTRAN 77 instructions.

Pcase

<task A>

Usect

<task B>

Usect

<task C>

End pcase

If any of the single stream code sections are conditional, the Csect statement can be
used. The condition is built into the Csect construct. An example in which all code
sections are conditional is given below.

Pcase

Csect (<condition>)

<task A>

Csect (<condition>)

<task B>

Csect (<condition>)

<task C>

End pcase

Csect and Usect can both appear in the same parallel case construct. The sections on
Csect and Usect describe these variations of the parallel case construct.


Usect 

This  statement  separates  multiple  single  stream  code  sections  of  a  parallel  case. 
When  Csect  is  used  to  start  a  conditional  case  section  then  Usect  is  not  used  to 
separate  it  from  the  previous  code  section.  Also,  Usect  is  not  used  if  there  is  only 
one  code  section. 


Csect 

This  statement  begins  a  conditional  single  stream  code  section  of  a  parallel  case  and 
has  the  following  form: 

Csect  (<condition>) 

where <condition> is a FORTRAN condition of the same form allowed in a
FORTRAN IF statement.


End pcase

The  pre-scheduled  parallel  case  construct  is  terminated  by  this  statement.  Note  that 
some  processes  may  proceed  past  this  point  while  portions  of  the  parallel  case  are 
still  being  executed. 


Scase

The Scase statement is an alternative to Pcase in writing a parallel case construct.
When a parallel case is introduced by the statement

Scase

the allocation of the work is done at execution time rather than being
pre-scheduled. A process receives the next available case section when it finishes a
previously assigned section. The other aspects of a self-scheduled parallel case
construct are the same as those of the pre-scheduled parallel case construct, except that it is
terminated by an End scase statement instead of End pcase.

In contrast to the Pcase construct, process synchronization is included to ensure that
two instances of a self-scheduled construct, either a parallel case or a parallel DO
loop, are not being executed simultaneously.


End  scase 

The  self-scheduled  parallel  case  construct  is  terminated  by  this  statement.  Although 
processes  may  proceed  past  this  point  while  portions  of  the  self-scheduled  parallel 
case  are  still  being  executed,  no  process  may  enter  another  self-scheduled  construct 
(parallel  case  or  loop)  or  re-enter  this  one  a  second  time  before  all  processes  have 
exited. 


Presched  DO 

A pre-scheduled parallel loop is introduced by the Presched DO statement, which has
the following form:

Presched DO <n> <i>=<i1>,<i2>[,<i3>]

This  statement  must  have  a  body  such  that  instances  of  the  body  for  different  values 
of  the  private  variable  <i>  are  independent  and  can  thus  be  executed  in  parallel. 
Pre-scheduling  partitions  different  values  of  <i>  evenly  over  processes  in  a  manner 
fixed  at  compile  time.  Pre-scheduled  loops  are  useful  when  the  execution  time  of 
the  loop  body  is  fairly  constant.  The  step  size  <i3>  is  optional  and  is  taken  as  one  if 
missing. 

The parameters <i1>, <i2> and <i3> must be constants or expressions yielding an
integer value. These values must be identical for all processes of the Force (a point to
check if Private variables appear in the expressions), and they must remain fixed during
execution of the loop. The parallel DO constructs do not nest with each other; however,
they may be nested (internally or externally) with normal FORTRAN DO loops.

An  example: 

Presched DO 99 J = 1,M1
C(J) = 0.0
99 End presched DO

would initialize the first M1 elements of the vector C to zeros. Note that M1
and the vector C are assumed to have been declared Shared or Shared Common.
Also note that no process synchronization occurs: processes may enter and leave
the loop asynchronously.
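
The text does not state which index values each process receives. Purely as an illustration, the Python sketch below assumes a cyclic dealing in which the process with index me takes every nproc-th value of the loop index, starting from its own position. Both this rule and the function name presched_indices are assumptions, not a description of the Force implementation.

```python
def presched_indices(me, nproc, i1, i2, i3=1):
    """Index values for the process with 1-based id `me`, assuming cyclic dealing."""
    if i3 > 0:
        values = list(range(i1, i2 + 1, i3))   # ascending loop, i2 inclusive
    else:
        values = list(range(i1, i2 - 1, i3))   # descending loop, i2 inclusive
    return values[me - 1::nproc]

for me in range(1, 4):
    print(me, presched_indices(me, 3, 1, 10))
```

Because each set depends only on me, the process count and the loop bounds, no communication between processes is needed to divide the work, which is why a loop of this kind requires no synchronization on entry or exit.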


<n>  End  presched  DO 

This  statement  terminates  the  body  of  a  pre-scheduled  DO  loop.  The  statement 
number  <n>  must  match  that  on  the  Presched  DO  statement. 


Selfsched  DO 

The  Selfsched  DO  statement  is  an  alternative  for  introducing  a  parallel  loop  and  it 
has  the  following  general  form: 

Selfsched DO <n> <i>=<i1>,<i2>[,<i3>]

The behavior of the Selfsched DO loop is the same as that of a Presched DO except
that the allocation of the work is done at execution time. A process receives the
next unassigned value of the private variable <i> when it finishes its previous
iteration. This tends to even the workload over processes when the execution time of the
loop body varies significantly for different values of <i>. The implementation generates a
Shared temporary variable to handle the shared loop index. Synchronization is
provided to ensure that the execution of different instances of self-scheduled loops or
cases is not overlapped. This means that the overhead is higher for self-scheduled
loops than for pre-scheduled loops.

As was the case with pre-scheduled loops, the parameters <i1>, <i2> and <i3> must
be constants or expressions yielding an integer value. These values must be
identical for all processes of the Force, and remain fixed while the loop is executing. The
parallel DO constructs do not nest with each other; however, they may be nested
(internally or externally) with normal FORTRAN DO loops.

For  example: 

Selfsched DO 99 J = 1,M1
C(J) = 0.0
IF (MOD(J,7) .EQ. 0) CALL HARDWORK(C(J))
99 End selfsched DO

would initialize the first M1 elements of the vector C to zeros, and call HARDWORK
if J is a multiple of seven. Note that M1 and the vector C are assumed to have been
declared Shared or Shared Common. Also note that processes may enter the loop
before all have arrived and may leave the loop before all have finished, but no
process may enter another self-scheduled loop, or re-enter this one a second time, until
all have exited. Processes may also not enter a subsequent self-scheduled case
construct until this self-scheduled construct is complete.
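
The idea of handing out the next unassigned index on demand can be sketched in Python. This is a model only: threads stand in for Force processes, a dictionary entry stands in for the Shared temporary that holds the loop index, and a lock makes the fetch-and-advance step atomic. The function name selfsched_do is hypothetical. Unlike the Force loop, this model waits for all workers before returning, which is stricter than the exit rule described above.

```python
import threading

def selfsched_do(nproc, i1, i2, body, i3=1):
    """Run body(i) for i = i1, i1+i3, ... through i2, handing indices out on demand."""
    if i3 == 0:
        raise ValueError("step must be nonzero")
    lock = threading.Lock()
    state = {"next": i1}              # stands in for the Shared loop-index temporary

    def worker():
        while True:
            with lock:                # fetch and advance the index atomically
                i = state["next"]
                state["next"] += i3
            if (i3 > 0 and i > i2) or (i3 < 0 and i < i2):
                return                # no work left for this worker
            body(i)

    threads = [threading.Thread(target=worker) for _ in range(nproc)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

c = [None] * 10

def body(j):
    c[j - 1] = 0.0                    # each index is handled by exactly one worker

selfsched_do(3, 1, 10, body)
print(c)
```

Every iteration costs one pass through the lock, which is the source of the extra overhead that self-scheduled loops carry compared with pre-scheduled loops.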

<n> End selfsched DO

This  statement  ends  the  body  of  the  self-scheduled  DO  with  statement  number  <n>. 


Pre2do 

Doubly indexed DO loops are supported as separate constructs within the Force.
Semantic considerations dictate that these be implemented as separate constructs
rather than by allowing nesting of the parallel DO loops.

Pre2do <n> <i>=<i1>,<i2>[,<i3>]; <j>=<j1>,<j2>[,<j3>]

Like  single-index  parallel  DO  loops,  this  statement  must  have  a  body  in  which 
instances  of  the  body  for  different  pairs  of  values  of  the  private  indices  <i>  and  <j> 
are  independent.  Pre-scheduling  partitions  different  pairs  of  values  of  <i>  and  <j> 
evenly  over  processes  in  a  manner  fixed  at  compile  time.  Step  sizes  <i3>  and  <j3> 
are  optional,  and  they  are  taken  as  one  if  missing.  For  example: 

Pre2do  99  J=  1,LIM  ;  K=10,l,-1 
C(J,K)  =  A(J,K)  +  B(J,K) 

99  End  Presched  DO 

Note that LIM and the arrays A, B, and C are assumed to have been declared
Shared or Shared Common, and J and K are Private. Again, note that no process
synchronization occurs: processes may enter and leave the loop asynchronously.


<n>  End  presched  DO 

This  statement  ends  the  body  of  the  doubly  indexed  pre-scheduled  DO  loop  as  well. 
<n>  must  match  the  <n>  given  in  the  Pre2do  statement. 
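
To make the idea of pre-scheduling index pairs concrete, here is a small Python sketch. It is only an illustration: the text above does not say which partitioning scheme the Force uses, so the cyclic (round-robin) assignment below is an assumption, and the names pre2do_pairs, me and nproc are hypothetical.

```python
def pre2do_pairs(me, nproc, i_range, j_range):
    """Return the (i, j) pairs that process `me` (0-based) would execute.

    i_range and j_range are (start, stop, step) triples with an inclusive
    stop, as in a FORTRAN DO statement.
    """
    def do_values(start, stop, step=1):
        # FORTRAN DO semantics: inclusive bound, step may be negative
        end = stop + (1 if step > 0 else -1)
        return range(start, end, step)

    pairs = [(i, j) for i in do_values(*i_range) for j in do_values(*j_range)]
    # Assumed scheme: deal the pairs out cyclically, one per process in turn
    return pairs[me::nproc]
```

With three processes and the index ranges 1..4 and 10..1 (step -1), the 40 pairs are split 14/13/13, and each pair is executed by exactly one process. Because the assignment depends only on the process number and the loop bounds, no run-time synchronization is needed.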


Self2do 

The  Self2do  statement  is  a  self-scheduled  version  of  the  doubly  indexed  DO  loop.  It 
has  the  following  form: 

Self2do <n> <i>=<i1>,<i2>[,<i3>] ; <j>=<j1>,<j2>[,<j3>]

Scheduling  of  the  indices  is  done  at  execution  time;  processes  receive  the  "next" 
pair  of  indices  available  when  they  are  ready  to  perform  an  iteration  of  the  doubly 
indexed  loop.  Self-scheduling  regulates  the  workload  among  processes  at  a  cost  of 
higher  synchronization  overhead.  When  loop  iterations  require  approximately  the 
same  amount  of  execution  time,  then  it  is  more  efficient  to  use  a  pre-scheduled  DO 
loop. Once again, there must be no data dependencies between loop bodies for different
<i>, <j> pairs; this is the programmer's responsibility.

The parameters <i1> through <i3> and <j1> through <j3> must be integer constants
or expressions, which remain fixed during a given execution of the Self2do loop.
Overlapping executions are prevented for different instances of doubly indexed, as
well as singly indexed, self-scheduled loops.


An example:

Self2do 100 I = 1,M1 ; J=1,M1
IF (I .NE. J) THEN
C(I,J) = 0.0
ELSE
C(I,J) = DTAN(DOUBLE(J*PI/M1))
END IF
100 End selfsched DO

Processes  may  enter  the  loop  before  all  have  arrived  and  leave  before  all  have 
finished,  but  no  process  may  enter  a  second  instance  of  a  self-scheduled  loop  before 
all  have  exited. 


<n>  End  selfsched  DO 

This statement terminates the body of a doubly indexed self-scheduled DO loop as
well. The statement number <n> must match that on the Self2do statement.


Askfor  DO 

The Askfor DO statement is a general means of scheduling the execution of a body of
parallel work that may grow dynamically, as in the case of recursive algorithms.
It has the following form:

Askfor  DO  <n>  init :  <i> 

This statement must have a body that will be self-scheduled to be executed by
processes of the Force <i> times. A typical body will start with a Critical section
that will coordinate the acquiring of some shared data representing a new task into
local variables, such that instances of the body for different values of the local variables
are independent and thus can be executed in parallel. The body of this construct
may also contain a More work statement that will increase the number of
times the Askfor body will be executed. Once the execution of the Askfor DO starts,
processes will not exit from the construct until no more work is left and all the
processes of the Force are finished with their scheduled work so that no new work
can be generated.

More  work  <val> 

This statement can optionally appear in the body of the Askfor DO construct. It will
cause the body to be executed <val> more times. Typically it is included in a Critical
section which adds a new task to a shared data structure to be processed by subsequent
executions of the body of the Askfor DO.

Consider the following example, which starts with the root node of a subtree of a
binary tree and marks the leaves of this subtree by placing a -1 in the right pointer
of each leaf node. All leaf nodes of the tree have -1 in their left pointer. Execution
begins with a list of nodes to be examined having its first element Nodes(1) set to
the number of the root node of the subtree, and the pointer to the end of this list,
Top, set to one.

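Askfor DO 100 Init : 1
C Take the next node to examine from the end of the shared list
Critical indx
I = Nodes(Top)
Top = Top - 1
End critical

10 If (L(I) .EQ. -1) Then
C Leaf node: mark it
R(I) = -1
Else
C Interior node: list the left child as new work, then follow the right child
Critical indx
Top = Top + 1
Nodes(Top) = L(I)
More work 1
End critical
I = R(I)
Go To 10
Endif

100 End askfor DO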

Originally the tree is represented in the two Shared integer arrays L and R, where
L(I) and R(I) are the left and right pointers, respectively, for node I. A leaf node has
a value of -1 in its left pointer. Top is used as a pointer to the next available node to be
processed. The Askfor DO initially has one node to process. A process which is
scheduled to execute the Askfor DO will check to see if the left pointer of this node
indicates a leaf. If it does, it will set the right pointer of this node to -1 and go
back to see if there is more work. If not, the process will add the left child of the
node to the list of nodes to be handled by the next available process and will go on to
check the nodes of the right branch.


<n>  End  askfor  DO 

This  statement  terminates  the  body  of  an  Askfor  Do  loop.  The  statement  number 
<n>  must  match  that  on  the  Askfor  Do  statement. 
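
The work-pool behaviour of Askfor DO and More work can be imitated in Python as follows. This is a hedged sketch rather than Force code: threads stand in for processes, a condition variable plays the role of the Critical section on indx, and the names mark_leaves, pending and num_workers are hypothetical. The counter pending corresponds to the amount of outstanding work; a worker leaves only when the list is empty and no running task can add to it.

```python
import threading

def mark_leaves(left, right, root, num_workers=4):
    """Set right[i] = -1 for every leaf (left[i] == -1) under `root`."""
    nodes = [root]                  # shared list of subtree roots to examine
    pending = 1                     # tasks listed or still being executed
    cond = threading.Condition()    # guards nodes and pending

    def worker():
        nonlocal pending
        while True:
            with cond:
                # Ask for work; wait while none is listed but more may appear
                while not nodes and pending > 0:
                    cond.wait()
                if not nodes:       # pending == 0: nothing left, none possible
                    return
                i = nodes.pop()
            while left[i] != -1:    # interior node
                with cond:
                    nodes.append(left[i])   # corresponds to "More work 1"
                    pending += 1
                    cond.notify()
                i = right[i]        # keep descending the right branch
            right[i] = -1           # mark the leaf
            with cond:
                pending -= 1        # this task is finished
                if pending == 0:
                    cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return right
```

The sketch uses 0-based node numbers and assumes, as the Force example does, that every interior node has a valid right child.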


II D. Synchronization


Barrier 

This statement must be executed by all processes of the Force. When all have
reached the Barrier statement, a single process will execute the "body" of the Barrier.
The body is defined as the block of code between the Barrier and the End barrier
statements. After the body has been executed by a single process, all the
processes of the Force will resume execution after the End barrier statement, and
they will have been synchronized. Note that it is not necessary for the Barrier to have a
body at all, but End barrier is always required.

Example: 

Barrier 
X  =  X+  1 
End  barrier 

Barrier synchronization will cause all the processes to wait at the first Barrier statement
until the last one arrives. A single process will then execute the body of the
Barrier construct, in this case incrementing X by one. After the body has been executed,
all processes continue at once with the statements following the End barrier
statement.

It is the programmer's responsibility to place Barriers where they make sense.
Placing a Barrier inside a Pcase causes a deadlock, since not all processes will reach
the Barrier, and those that do will hang. Likewise, Barriers within self-scheduled or
pre-scheduled DO loops should be avoided. They would also deadlock, unless the
number of processes divides evenly into the number of loop iterations.


End  barrier 

Paired with the previous statement, this one delimits a section of code executed by a
single instruction stream. Synchronized parallel execution begins after this statement.
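
For readers more familiar with thread libraries, the Barrier construct behaves much like a barrier with an attached action. The Python sketch below is an analogue only, not Force code; it relies on the standard threading.Barrier, whose action callable is run by one thread once all have arrived. The names run_barrier_demo, state and seen are hypothetical.

```python
import threading

def run_barrier_demo(num_procs=4):
    """Return (x, seen): final X and the value each process saw afterwards."""
    state = {"x": 0}                 # stands in for the Shared variable X
    seen = [None] * num_procs

    def barrier_body():
        # Run by exactly one thread once all have arrived, before any is released
        state["x"] = state["x"] + 1

    barrier = threading.Barrier(num_procs, action=barrier_body)

    def process(me):
        barrier.wait()               # Barrier ... End barrier
        seen[me] = state["x"]        # executed only after the body has run

    threads = [threading.Thread(target=process, args=(me,))
               for me in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["x"], seen
```

As in the Force, X is incremented once regardless of the number of processes, and every process sees the new value when it continues.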


Critical 

Mutual  exclusion  can  be  accomplished  by  named  critical  sections  using  the  Critical 
construct,  which  has  the  following  form: 

Critical  <lock-var> 

The critical section is ended by the End critical statement. Use of a Critical section
guarantees that at most one process at a time will be executing any block of code nested
between the Critical and End critical statements of critical sections with the same
<lock-var>.


The user must declare <lock-var> as a Shared variable, preferably of type LOGICAL.
This variable is used as a lock and should contain no other value. Two or
more critical sections may share the same <lock-var>. Two critical sections on the
same <lock-var> cannot execute simultaneously. If one wishes to coordinate activities
between Force modules, then the <lock-var> may be a Shared Common variable,
declared in those Force modules that wish to use it. For example:

Shared  Common  /IO/  LOGICAL  WRITER 
End  Declarations 

Critical  WRITER 
WRITE(6,10) ME
10 FORMAT(1X,"Me = ",I3)

End  critical 


End  critical 

This  statement  is  paired  with  the  nearest  unmatched  preceding  Critical  statement  to 
delimit  a  critical  section.  Nested  critical  sections  are  allowed;  however  there  is  no 
automatic  deadlock  prevention  employed  if  critical  sections  are  improperly  nested. 


Produce 

Produce  <async  var>  =  <expr> 

If the asynchronous variable <async var> is "empty," Produce assigns the value of
the expression <expr> to <async var> and marks <async var> as "full." If
<async var> is not "empty," the process currently executing Produce will wait until
<async var> becomes "empty" and then make the assignment and mark <async var>
as "full." These actions occur atomically. The variable <async var> must have been
declared as an asynchronous variable using the Async statement.

Example:


Private  REAL  YY 
Async  REAL  XX 


End  Declarations 


Barrier 
Void  XX 
End  Barrier 


YY  =  7.0*COS(A+B) 
Produce  XX  =  YY  +  3 


Consume 

Consume  <async  var>  into  <var2> 

If the asynchronous variable is "full," then this macro routine will assign the value
of <async var> to <var2> and mark <async var> as "empty." If it is not "full," Consume
will wait until <async var> becomes "full," store its value, and mark it as
"empty." If multiple processes are executing a Consume statement on the same
<async var>, and if the <async var> is "full," then only one consumer process will
succeed. The others will have to wait until <async var> is set "full" again (by a
Produce) before they will have a chance to succeed. The variable <async var> must
have been declared as an asynchronous variable. In most applications, <var2> will
be Private. For example:

Consume  XX  into  YY 


Copy 

Copy  <async  var>  into  <var2> 

This  macro  routine  will  store  the  value  of  the  asynchronous  variable  <async  var> 
into  <var2>  if  <async  var>  is  "full,"  without  changing  the  variable’s  status.  If  the 
variable  is  "empty,"  then  Copy  will  wait  until  <async  var>  becomes  "full,"  and  then 
return  its  value,  and  leave  it  "full."  The  variable  <async  var>  must  have  been 
declared  as  an  asynchronous  variable.  For  example: 

Copy XX into YY


Void <async var>

This  macro  will  unconditionally  mark  the  asynchronous  variable  <async  var>  as 
"empty."  The  variable  <async  var>  must  have  been  declared  as  an  asynchronous 
variable  by  the  Async  statement.  Note,  asynchronous  variables  are  not  necessarily 
"empty"  when  declared;  normally  one  would  first  Void  an  asynchronous  variable 
before  using  it  in  a  producer/consumer  macro.  For  example: 

Void  XX 


Isfull  (  <async  var>  ) 

This macro "function" will return the logical state of the asynchronous variable
<async var>, with TRUE corresponding to "full" and FALSE indicating that the asynchronous
variable is "empty." It may be used anywhere that a FORTRAN logical
function would be used. The variable <async var> must have been declared as an
asynchronous variable by the Async statement. For example:


Async  REAL  XX 
Private  REAL  MYCOPY 
End  declarations 


IF(  Isfull(XX) )  THEN 
Consume  XX  into  MYCOPY 
ELSE 

<do  something  else> 

END  IF 
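
The full/empty behaviour of Produce, Consume, Copy, Void and Isfull can be summarized in a small Python class. This is an illustrative analogue under our own assumptions, not the Force implementation: the class AsyncVar and the function global_max are hypothetical names, a condition variable supplies the waiting and the atomicity, and the object starts "empty" even though the Force makes no such promise for newly declared variables.

```python
import threading

class AsyncVar:
    """Full/empty variable loosely modelled on a Force Async variable."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False          # assumption: this sketch starts "empty"
        self._value = None

    def produce(self, value):
        with self._cond:
            while self._full:       # wait until "empty"
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def consume(self):
        with self._cond:
            while not self._full:   # wait until "full"
                self._cond.wait()
            self._full = False      # only one consumer gets this value
            self._cond.notify_all()
            return self._value

    def copy(self):
        with self._cond:
            while not self._full:
                self._cond.wait()
            return self._value      # state stays "full"

    def void(self):
        with self._cond:
            self._full = False      # unconditionally mark "empty"
            self._cond.notify_all()

    def isfull(self):
        with self._cond:
            return self._full


def global_max(local_maxima):
    """Combine per-process maxima through one AsyncVar, one thread per value."""
    allmax = AsyncVar()
    allmax.void()
    allmax.produce(0.0)

    def process(pmax):
        tem = allmax.consume()          # take exclusive hold of the value
        allmax.produce(max(tem, pmax))  # put back the larger one

    threads = [threading.Thread(target=process, args=(p,)) for p in local_maxima]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return allmax.copy()
```

The Consume/Produce pair in global_max acts as a critical section that also carries the data: while one thread holds the value, the variable is "empty" and every other consumer waits.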


III. Restrictions on the Force Macros

The Force macro implementations on the Flex/32, the Encore Multimax, the
Sequent Balance and the Alliant FX/Series adhere to almost all of the FORTRAN standards
and elements of style except for the following points:

1. Barrier, Forcecall, and Join, and all of the macros that specify parallel execution,
must be executed by all the processes executing the parallel program. If a fraction
of the processes skips over one of these constructs, the program may hang indefinitely
or produce unexpected results.

2. Branching into or out of the body of a Force construct is not allowed. Such branches
may not be detected by either the Force preprocessor or the FORTRAN compiler, and
they will lead to unexpected results.

3. Except for the statements closing parallel DO loops, Force statements should not be
numbered. Numbered Force statements will not be recognized by the preprocessor
and will produce FORTRAN syntax errors.

4.  The  Force  preprocessor  may  generate  subroutine  names  using  a  variation  on  the 
name  of  a  given  Force  module.  For  this  reason,  the  first  five  characters  of  the  name 
of  a  Force  module  must  uniquely  identify  that  module. 

5. Asynchronous variables cannot be passed as parameters to other modules or subroutines
and be expected to behave asynchronously. Async common must be used for
this purpose.

6.  FORTRAN  BLOCK  DATA  is  currently  not  supported  and  thus  Shared  and  Shared 
Common  variables  cannot  be  initialized  statically  at  compile  time. 

7.  The  FORTRAN  DATA  statement  can  only  be  used  to  initialize  Private  variables. 

8. Finally, it should be noted that the line numbers referenced by the error
messages resulting from using the "force" command refer to the .f files and not to
the .frc files.


IV. How to Invoke the Force

This section will discuss the UNIX shell scripts, force, forcerun, and preforce,
used to invoke the Force. Implementations on four machines will be considered: the
Flex/32 (Flexible Computer Corp.), the Multimax (Encore Computer Corp.), the Balance
(Sequent Computer Corp.), and the Alliant FX/Series (Alliant Computer Systems
Corp.).

force is the shell command that is used to preprocess, compile and link Force source
programs. The force command takes an argument list of files and flags and produces a
parallel executable output program. We will adopt the convention that Force source files
have a filename ending with a .frc extension. Files in the argument list with a .frc extension
will first be preprocessed to expand the Force macros. The resulting files along with
the Force driver program and any other files specified will then be compiled and linked.

The forcerun command is used to execute a Force program. forcerun also
specifies the number of component processes to be used by the Force program during that
run. forcerun takes two arguments: the first is the name of the Force executable file,
and the second is an integer representing the number of processes (processors on
Flex/32) to be used for that run.

The preforce command performs only the preprocessing steps, producing FORTRAN
.f files from Force .frc files specified in the argument list. The preforce shell
script is intended as a debugging convenience, as the f77 compiler used by the force
command will give line numbers referring to the .f file when referencing errors.

The  force,  forcerun,  and  preforce  commands  are  executable  from  any  directory, 
and  we  recommend  that  frequent  users  of  the  Force  include  aliases  for  these  shell  scripts 
in  their  .cshrc  files  or  links  to  them  in  their  own  bins.  All  three  commands,  when 
invoked  with  no  arguments,  will  print  a  help  message  illustrating  their  use.  The  sections 
below  will  describe  features  and  options  of  the  commands  that  are  specific  to  the  Flex/32, 
the  Multimax,  the  Balance,  or  the  Alliant. 

IV  A.  Flex/32  (Flexible  Computer  Corp.) 

The  shell  scripts,  force,  forcerun,  and  preforce,  may  be  found  in  the 
/usr/local/force  directory  on  NASA/Langley’s  Flex/32. 

On the Flex/32, the force command uses Flexible’s cf77 compiling command, and
the Force preprocessor will generate .cf files from .frc files in the argument list. force
will accept all options associated with cf77. The Greenhills compiler is automatically
selected by force, and cfg.18 is used as a default if no other configuration file is specified.
The syntax is as follows:

force [cf77 options] <filename list>

Some  examples: 

force  matmul.frc  init.frc  subs.f 

force -o test.exe -h cfg.8 test1.frc test2.frc

The forcerun command is used to execute a Force program. It has the following
syntax:

forcerun <executable file> <number of processors>

For example:

forcerun test.exe 18

On the Flex/32, preforce invokes both the Flexible and Force preprocessors. preforce
accepts files ending with an .frc or .cf extension and creates the .f FORTRAN
equivalents. There are two options. The -cf option invokes only the Force preprocessor,
creating .cf files from .frc source files. The -a option creates "all files": .cf, .f, .su.f, .sh.f,
and .CF.l. When used without options, preforce will create only .f files. The syntax is as
follows:

preforce <filename> [filename,...]

An example:

preforce thisfile.frc


IV  B.  Multimax  (Encore  Computer  Corp.) 


The shell scripts, force, forcerun, and preforce, may be found in the
/usr/local/unsupp/force directory on the University of Colorado at Boulder’s Multimax
(encore). For the Multimax, force preprocesses .frc files in the argument list producing
.f files, and then uses the standard f77 compiler. force will accept all options associated
with f77. The syntax is as follows:

force [f77 options] <filename list>


An  example: 

force  -o  matmul.exe  matmul.frc  x.f 


The  forcerun  command  is  used  to  execute  a  Force  program.  It  has  the  following 
syntax: 

forcerun <executable file> <number of processes>

For  example: 


forcerun  matmul.exe  8 

The preforce command performs only the preprocessing steps, producing FORTRAN
.f files from Force .frc input files. The syntax is as follows:

preforce <filename> [filename,...]

An  example: 

preforce  matmul.frc 


IV  C.  Balance  (Sequent  Computer  Corp.) 

The shell scripts, force, forcerun and preforce, may be found in the
/usr/local/unsupp/force directory on the University of Colorado at Boulder's Sequent
(tramp). For the Sequent, force preprocesses .frc files in the argument list producing .f
files, and then uses the standard FORTRAN (Silicon Valley) compiler. force will accept
all options associated with the (NS32000 series Silicon Valley) FORTRAN compiler.
The syntax is as follows:

force [fortran options] <filename list>

An  example: 

force  -o  matmul.exe  matmul.frc  x.f 

The  forcerun  command  is  used  to  execute  a  Force  program.  It  has  the  following 
syntax: 

forcerun <executable file> <number of processes>

For  example: 

forcerun  matmul.exe  8 

The preforce command performs only the preprocessing steps, producing FORTRAN
.f files from Force .frc input files. The syntax is as follows:

preforce <filename> [filename...]

An  example: 

preforce  matmul.frc 


IV  D.  Alliant  FX/Series  (Alliant  Computer  Systems  Corp.) 

The shell scripts, force, forcerun, and preforce, may be found in the
/users/std_417/benten/latest/new directory on the University of Colorado at Boulder’s
Alliant FX/8 (cerberus). For the Alliant, force preprocesses .frc files in the argument list
producing .f files, and then uses the FX/Fortran compiler. force will accept all options
associated with the FX/Fortran compiler, except that it ignores the concurrency option
whether invoked locally or globally. force by itself invokes global optimization and
vectorization options. If suppression of vectorization is desired, the NOVECTOR directive
should be used inside the source program. The syntax is as follows:

force [FX/Fortran options] <filename list>

Some  examples: 

force  -o  matmul.exe  matmul.frc  sub.f 
force -o test.exe -DAS test1.frc test2.frc tl.f

The  forcerun  command  is  used  to  execute  a  Force  program.  It  has  the  following 
syntax: 

forcerun <executable file> <number of processes>

For  example: 


forcerun  matmul.exe  4 


V.  Sample  Program  Listing 


************************************************************************ 

*  Force  demo  program 

*  This  program  normalizes  a  square  matrix  by  its  largest  element. 

*  An  external  Force  module,  INTMAT,  is  called  to  initialize  the 

*  matrix.  Another  Force  module,  OUTMAT,  is  called  to  print  the 

*  final  matrix. 

************************************************************************ 


Force  DEMO  of  NP  ident  ME 
Private  REAL  PMAX,  TEM 
Private  INTEGER  INDEX 
Shared  REAL  X(100,100) 

Async  REAL  ALLMAX 
Externf INTMAT
End  declarations 

C INTMAT is an external subroutine that will initialize the matrix.
Forcecall INTMAT(X,100)

C  Now  we  must  search  the  matrix  for  its  greatest  element... 

C ALLMAX holds the current maximum value

C  Initialize  ALLMAX 

Barrier 

Void  ALLMAX 
Produce  ALLMAX  =  0 
End  barrier 

PMAX  =  0 

C  Preschedule  rows  of  X  among  processors... 

C  Each  processor  finds  the  maximum  of  its  row  in  the  inner  loop. 

Presched do 100 I=1,100
DO 200 J=1,100
TEM  =  ABS(X(I,J)) 

IF  (TEM  .GT.  PMAX)  PMAX  =  TEM 
200  CONTINUE 

100  End  presched  do 

C  The  processors  communicate  to  find  the  overall  max  of  their  local  max  vals. 
Consume  ALLMAX  into  TEM 
IF  (PMAX  .GT.  TEM)  TEM  =  PMAX 
Produce  ALLMAX  =  TEM 


C  Synchronize ... 

Barrier 
End  Barrier 

Copy  ALLMAX  into  PMAX 
IF  (PMAX  .GT.  0)  THEN 

C  Normalize  the  matrix,  dividing  the  labor  on  the  outer  loop. 
Presched do 300 I=1,100
DO 400 J=1,100
X(I,J) = X(I,J) / PMAX
400  CONTINUE 
300  End  presched  do 

Barrier 
End  barrier 

END  IF 

C  OUTMAT  will  perform  sequential  i/o... 

Pcase on INDEX
Call OUTMAT(X,100)
End pcase

Join 

END 

SUBROUTINE OUTMAT(X,N)

INTEGER  N,  INDEX 
REAL  X(N,N) 

DO 10 I=1,N
DO  10  J=1,N 

10  write(6,*)  I,  J,  X(I,J) 

RETURN 

END 


************************************************************************ 

*  Assume  that  the  next  program  listing  is  in  a  separate  file. 
************************************************************************ 

Forcesub INTMAT(MAT,N) of NP ident ME
C  This  parallel  subroutine  will  initialize  the  matrix  MAT 
C  to  a  "test"  value. 


INTEGER  N 

REAL  MAT(N,N),  GEN 

End  declarations 

Presched do 20 I=1,N
DO  30  J=1,N 

C  The  sequential  function  GEN  is  used  to  generate  values 
MAT(I,J)  =  GEN(I,J) 

30  CONTINUE 

20  End  presched  do 

RETURN 

END 

REAL  FUNCTION  GEN(I,J) 

C  0.0  <  GEN  <=  1000.0 

INTEGER  I,J 
IF  ((I+J)  .GE.  1)  THEN 
GEN  =  1000.0  /(I+J) 

ELSE 

GEN  =  1000.0 
END  IF 
RETURN 
END 

************************************************************************

************************************************************************